A "not-so-shallow" parser for collocational analysis

Authors

  • Roberto Basili
  • Maria Teresa Pazienza
  • Paola Velardi
Abstract

Collocational analysis is the basis of many studies on lexical acquisition. Collocations are extracted from corpora using more or less shallow processing techniques that span from purely statistical methods to partial parsers. Our point is that, although one of the objectives of collocational analysis is to acquire high-coverage lexical data at low human cost, this is often not the case. Human work is in fact required for the initial training of most statistically based methods. A more serious problem is that shallow processing techniques produce a level of noise that is not acceptable for a fully automated system. We propose in this paper a not-so-shallow parsing strategy that reliably detects binary and ternary relations among words. We show that adding more syntactic knowledge to the recipe significantly improves the recall and precision of the detected collocations, regardless of any subsequent statistical computation, while still meeting the computational requirements of corpus parsers.

Similar resources

A Noun Phrase Parser of English

A noun phrase parser is useful for several purposes, e.g. for index term generation in an information retrieval application; for the extraction of collocational knowledge from large corpora for the development of computational tools for language analysis; for providing a shallow but accurately analysed input for a more ambitious parsing system; for the discovery of translation units, and so on....

Probabilistic Parsing of Korean Sentences Using Collocational Information

Lexical information is one of the most important sources for improving the accuracy of syntactic disambiguation. This paper describes a Korean probabilistic parser that is based on the probabilities of phrase structure rules as well as the probabilities of collocational information between lexical items to resolve syntactic ambiguity. The proposed parser is shown by means of an extensive...

Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach

Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...

Sentence Analysis and Collocation Identification

Identifying collocations in a sentence, in order to ensure their proper processing in subsequent applications, and performing the syntactic analysis of the sentence are interrelated processes. Syntactic information is crucial for detecting collocations, and vice versa, collocational information is useful for parsing. This article describes an original approach in which collocations are identifi...

Multilingual collocation extraction with a syntactic parser

An impressive amount of work was devoted over the past few decades to collocation extraction. The state of the art shows that there is a sustained interest in the morphosyntactic preprocessing of texts in order to better identify candidate expressions; however, the treatment performed is, in most cases, limited (lemmatization, POS-tagging, or shallow parsing). This article presents a collocatio...

Publication date: 1994